Topic representation: Finding more representative words in topic models
Similar resources
Finding Topic Words for Hierarchical Summarization (DRAFT)
The Use of Topic Representative Words in Text Categorization
We present a novel way to identify the representative words that are able to capture the topic of documents for use in text categorization. Our intuition is that not all word n-grams equally represent the topic of a document, and thus using all of them can potentially dilute the feature space. Hence, our aim is to investigate methods for identifying good indexing words, and empirically evaluate...
Topic Models with Logical Constraints on Words
This paper describes a simple method to achieve logical constraints on words for topic models based on a recently developed topic modeling framework with Dirichlet forest priors (LDA-DF). Logical constraints mean logical expressions of pairwise constraints, Must-links and Cannot-Links, used in the literature of constrained clustering. Our method can not only cover the original constraints of th...
Building Topic Models Based on Anchor Words
Suppose you were given a stack of documents, such as all of the articles published in a particular newspaper, and your goal was to make sense of this data, to determine topics that this data may be made up from. To frame this as an unsupervised learning problem, suppose the documents were written in a foreign language and came from a foreign planet. By understanding topics that these documents ...
Topic extraction with multiple topic-words in broadcast-news speech
This paper reports on topic extraction in Japanese broadcast-news speech. Using continuous speech recognition, we studied the extraction of several topic-words from broadcast news. A combination of multiple topic-words represents the content of the news. This is a more detailed and more flexible approach than using a single word or a single category. A topic-extraction model shows the degree of...
Journal
Journal title: Pattern Recognition Letters
Year: 2019
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2019.01.018